Navigation Map | Top
Step 1: Quality Control | Top
For the first step, the pipeline will perform quality assessment on the raw fastq files.
The BDS code snippet for the sample KO01 will look like:
$ grep -A1 run.RawReadQC.FastQC.SRR1205282.sh Submit_*.bds
## dep( [ '/CRI/HPC/cri_rnaseq_2018/RNAseq/RawReadQC/KO01/SRR1205282/SRR1205282_fastqc.zip' ] <- [ '/CRI/HPC/cri_rnaseq_2018/example/data/SRR1205282.fastq.gz' ], cpus := 1, mem := 16*G, timeout := 72*hour, taskName := "FastQC.SRR1205282") sys bash /CRI/HPC/cri_rnaseq_2018/RNAseq/shell_scripts/run.RawReadQC.FastQC.SRR1205282.sh; sleep 2
## goal( [ '/CRI/HPC/cri_rnaseq_2018/RNAseq/RawReadQC/KO01/SRR1205282/SRR1205282_fastqc.zip' ] )
This code chunk will invoke the bash script RNAseq/shell_scripts/run.RawReadQC.FastQC.SRR1205282.sh to execute FastQC on the KO01(BK-AA-1_S11_L007_R355) sequencing library.
After the completion of the entire pipeline, you can check FastQC report per individual libraries; for instance, the partial report of KO01 will be as follows or a full report.

You can check FastQC Help for more details about how to interpret a FastQC report.
Or, compare your reports to the example reports provided by FastQC for a Good Illumina Data or Bad Illumina Data.
Step 2.1: Read Alignment | Top
In this step, the pipeline will conduct read alignment on the raw fastq files.
The BDS code snippet for the sample KO01 will look like:
$ grep -A1 run.alignRead.star.SRR1205282.sh Submit_*.bds
## dep( [ '/CRI/HPC/cri_rnaseq_2018/RNAseq/Aln/star/KO01/SRR1205282/SRR1205282.star.bam' ] <- [ '/CRI/HPC/cri_rnaseq_2018/example/data/SRR1205282.fastq.gz' ], cpus := 4, mem := 64*G, timeout := 72*hour, taskName := "star.SRR1205282") sys bash /CRI/HPC/cri_rnaseq_2018/RNAseq/shell_scripts/run.alignRead.star.SRR1205282.sh; sleep 2
## goal( [ '/CRI/HPC/cri_rnaseq_2018/RNAseq/Aln/star/KO01/SRR1205282/SRR1205282.star.bam' ] )
This code chunk will invoke the bash script RNAseq/shell_scripts/run.alignRead.star.SRR1205282.sh to execute STAR on the KO01(SRR1205282) sequencing library.
After the completion of the entire pipeline, you can check the alignment result of each individual libraries; for instance, the result of KO01(SRR1205282) will be as follows.
$ tree RNAseq/Aln/star/KO01/SRR1205282
RNAseq/Aln/star/KO01/SRR1205282
|-- SRR1205282.star.Aligned.sortedByCoord.out.bam
|-- SRR1205282.star.Log.final.out
|-- SRR1205282.star.Log.out
|-- SRR1205282.star.Log.progress.out
|-- SRR1205282.star.SJ.out.tab
|-- SRR1205282.star.Unmapped.out.mate1
|-- SRR1205282.star.bai
|-- SRR1205282.star.bam -> SRR1205282.star.Aligned.sortedByCoord.out.bam
`-- run.alignRead.star.SRR1205282.log
You can check a log file (e.g., RNAseq/Aln/star/KO01/SRR1205282/SRR1205282.star.Log.final.out) for more alignment information provided by STAR.
$ cat RNAseq/Aln/star/KO01/SRR1205282/SRR1205282.star.Log.final.out
## Started job on | Nov 14 10:39:46
## Started mapping on | Nov 14 10:40:01
## Finished on | Nov 14 10:50:31
## Mapping speed, Million of reads per hour | 423.58
##
## Number of input reads | 74126025
## Average input read length | 49
## UNIQUE READS:
## Uniquely mapped reads number | 63912102
## Uniquely mapped reads % | 86.22%
## Average mapped length | 48.82
## Number of splices: Total | 10010945
## Number of splices: Annotated (sjdb) | 9913245
## Number of splices: GT/AG | 9915467
## Number of splices: GC/AG | 77874
## Number of splices: AT/AC | 11550
## Number of splices: Non-canonical | 6054
## Mismatch rate per base, % | 0.25%
## Deletion rate per base | 0.01%
## Deletion average length | 1.43
## Insertion rate per base | 0.00%
## Insertion average length | 1.22
## MULTI-MAPPING READS:
## Number of reads mapped to multiple loci | 0
## % of reads mapped to multiple loci | 0.00%
## Number of reads mapped to too many loci | 9697272
## % of reads mapped to too many loci | 13.08%
## UNMAPPED READS:
## % of reads unmapped: too many mismatches | 0.00%
## % of reads unmapped: too short | 0.44%
## % of reads unmapped: other | 0.26%
## CHIMERIC READS:
## Number of chimeric reads | 0
## % of chimeric reads | 0.00%
Step 2.2: Alignment QC | Top
In this step, the pipeline will conduct a QC on alignment result.
The BDS code snippets for the sample KO01 will look like:
$ grep -A1 run.alnQC.*.star.KO01.*.sh Submit_*.bds
## dep( [ '/CRI/HPC/cri_rnaseq_2018/RNAseq/AlnQC/picard/star/KO01/KO01.star.picard.RNA_Metrics', '/CRI/HPC/cri_rnaseq_2018/RNAseq/AlnQC/picard/star/KO01/KO01.star.picard.RNA_Metrics.pdf' ] <- [ '/CRI/HPC/cri_rnaseq_2018/RNAseq/Aln/star/KO01/KO01.star.bai' ], cpus := 4, mem := 32*G, timeout := 72*hour, taskName := "picard.star.KO01") sys bash /CRI/HPC/cri_rnaseq_2018/RNAseq/shell_scripts/run.alnQC.picard.star.KO01.CollectRnaSeqMetrics.sh; sleep 2
## goal( [ '/CRI/HPC/cri_rnaseq_2018/RNAseq/AlnQC/picard/star/KO01/KO01.star.picard.RNA_Metrics', '/CRI/HPC/cri_rnaseq_2018/RNAseq/AlnQC/picard/star/KO01/KO01.star.picard.RNA_Metrics.pdf' ] )
## --
## dep( [ '/CRI/HPC/cri_rnaseq_2018/RNAseq/AlnQC/rseqc/star/KO01/KO01.star.rseqc.clipping_profile.xls', '/CRI/HPC/cri_rnaseq_2018/RNAseq/AlnQC/rseqc/star/KO01/KO01.star.rseqc.clipping_profile.r', '/CRI/HPC/cri_rnaseq_2018/RNAseq/AlnQC/rseqc/star/KO01/KO01.star.rseqc.clipping_profile.pdf' ] <- [ '/CRI/HPC/cri_rnaseq_2018/RNAseq/Aln/star/KO01/KO01.star.bai' ], cpus := 8, mem := 64*G, timeout := 72*hour, taskName := "rseqc.star.KO01") sys bash /CRI/HPC/cri_rnaseq_2018/RNAseq/shell_scripts/run.alnQC.rseqc.star.KO01.clipping_profile.py.sh; sleep 2
## goal( [ '/CRI/HPC/cri_rnaseq_2018/RNAseq/AlnQC/rseqc/star/KO01/KO01.star.rseqc.clipping_profile.xls', '/CRI/HPC/cri_rnaseq_2018/RNAseq/AlnQC/rseqc/star/KO01/KO01.star.rseqc.clipping_profile.r', '/CRI/HPC/cri_rnaseq_2018/RNAseq/AlnQC/rseqc/star/KO01/KO01.star.rseqc.clipping_profile.pdf' ] )
## --
## dep( [ '/CRI/HPC/cri_rnaseq_2018/RNAseq/AlnQC/rseqc/star/KO01/KO01.star.rseqc.infer_experiment.txt' ] <- [ '/CRI/HPC/cri_rnaseq_2018/RNAseq/Aln/star/KO01/KO01.star.bai' ], cpus := 8, mem := 64*G, timeout := 72*hour, taskName := "rseqc.star.KO01") sys bash /CRI/HPC/cri_rnaseq_2018/RNAseq/shell_scripts/run.alnQC.rseqc.star.KO01.infer_experiment.py.sh; sleep 2
## goal( [ '/CRI/HPC/cri_rnaseq_2018/RNAseq/AlnQC/rseqc/star/KO01/KO01.star.rseqc.infer_experiment.txt' ] )
This code chunk will invoke few bash scripts (e.g., RNAseq/shell_scripts/run.alnQC.picard.star.KO01.CollectRnaSeqMetrics.sh, run.alnQC.rseqc.star.KO01.clipping_profile.py.sh, and run.alnQC.rseqc.star.KO01.infer_experiment.py.sh) to execute alignment QC tools (i.e., Picard and RSeQC) on the sample KO01.
After the completion of the entire pipeline, you can check the alignment QC results of each individual samples; for instance, the results of KO01 will be as follows.
$ tree RNAseq/AlnQC/*/star/KO01
RNAseq/AlnQC/picard/star/KO01
|-- KO01.star.picard.RNA_Metrics
|-- KO01.star.picard.RNA_Metrics.pdf
|-- run.alnQC.picard.star.KO01.CollectRnaSeqMetrics.log
`-- tmp
RNAseq/AlnQC/rseqc/star/KO01
|-- KO01.star.rseqc.clipping_profile.pdf
|-- KO01.star.rseqc.clipping_profile.r
|-- KO01.star.rseqc.clipping_profile.xls
|-- KO01.star.rseqc.infer_experiment.txt
|-- run.alnQC.rseqc.star.KO01.clipping_profile.py.log
`-- tmp
You can check alignment statistics (e.g., RNAseq/AlnQC/picard/star/KO01/KO01.star.picard.RNA_Metrics) for more information provided by Picard.
$ head RNAseq/AlnQC/picard/star/KO01/KO01.star.picard.RNA_Metrics
## ## htsjdk.samtools.metrics.StringHeader
## # picard.analysis.CollectRnaSeqMetrics REF_FLAT=/CRI/HPC/cri_rnaseq_2018/example/references/v28_92_GRCh38.p12/genes.refFlat.txt RIBOSOMAL_INTERVALS=/CRI/HPC/cri_rnaseq_2018/example/references/v28_92_GRCh38.p12/GRCh38_rRNA.bed.interval_list STRAND_SPECIFICITY=NONE CHART_OUTPUT=/CRI/HPC/cri_rnaseq_2018/RNAseq/AlnQC/picard/star/KO01/KO01.star.picard.RNA_Metrics.pdf METRIC_ACCUMULATION_LEVEL=[SAMPLE, ALL_READS] INPUT=/CRI/HPC/cri_rnaseq_2018/RNAseq/Aln/star/KO01/KO01.star.bam OUTPUT=/CRI/HPC/cri_rnaseq_2018/RNAseq/AlnQC/picard/star/KO01/KO01.star.picard.RNA_Metrics TMP_DIR=[/CRI/HPC/cri_rnaseq_2018/RNAseq/AlnQC/picard/star/KO01/tmp] MINIMUM_LENGTH=500 RRNA_FRAGMENT_PERCENTAGE=0.8 ASSUME_SORTED=true STOP_AFTER=0 VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json
## ## htsjdk.samtools.metrics.StringHeader
## # Started on: Wed Nov 14 10:53:18 CST 2018
##
## ## METRICS CLASS picard.analysis.RnaSeqMetrics
## PF_BASES PF_ALIGNED_BASES RIBOSOMAL_BASES CODING_BASES UTR_BASES INTRONIC_BASES INTERGENIC_BASES IGNORED_READS CORRECT_STRAND_READS INCORRECT_STRAND_READS PCT_RIBOSOMAL_BASES PCT_CODING_BASES PCT_UTR_BASES PCT_INTRONIC_BASES PCT_INTERGENIC_BASES PCT_MRNA_BASES PCT_USABLE_BASES PCT_CORRECT_STRAND_READS MEDIAN_CV_COVERAGE MEDIAN_5PRIME_BIAS MEDIAN_3PRIME_BIAS MEDIAN_5PRIME_TO_3PRIME_BIAS SAMPLE LIBRARY READ_GROUP
## 3131692998 3120045478 1072267 1902187708 1009131413 164870020 42790598 0 0 0 0.000344 0.609667 0.323435 0.052842 0.013715 0.933102 0.929631 0 0.783501 0.448148 0.690783 0.59721
## 3131692998 3120045478 1072267 1902187708 1009131413 164870020 42790598 0 0 0 0.000344 0.609667 0.323435 0.052842 0.013715 0.933102 0.929631 0 0.783501 0.448148 0.690783 0.59721 unknown
Or, the respective coverage plot of the sample KO01 produced by Picard will be as follows.

There are two alignment measurements performed using RSeQC.
- clipping_profile.py
- Calculate the distributions of clipped nucleotides across reads
- infer_experiment.py
- Use to “guess” how RNA-seq sequencing was configured, particularly how reads were stranded for strand-specific RNA-seq data, through comparing the “strandedness of reads” with the “strandedness of transcripts”.
The results will be as follows. Please check the RSeQC website for more measurements and details.

$cat RNAseq/AlnQC/rseqc/star/KO01/KO01.star.rseqc.infer_experiment.txt
##
##
## This is SingleEnd Data
## Fraction of reads failed to determine: 0.2143
## Fraction of reads explained by "++,--": 0.4025
## Fraction of reads explained by "+-,-+": 0.3832
Here, you can confirm again the sample KO01 is a single-end, strand-specific library using the second-strand synthesis.
Step 3: Expression Quantification | Top
In this step, the pipeline will conduct expression quantification over alignments.
The BDS code snippet for the sample KO01 will look like:
$ grep -A1 run.quant.featurecounts.star.KO01.sh Submit_*.bds
## dep( [ '/CRI/HPC/cri_rnaseq_2018/RNAseq/Quantification/featurecounts/star/KO01/KO01.star.featurecounts.count' ] <- [ '/CRI/HPC/cri_rnaseq_2018/RNAseq/Aln/star/KO01/KO01.star.bai' ], cpus := 4, mem := 32*G, timeout := 72*hour, taskName := "featurecounts.star.KO01") sys bash /CRI/HPC/cri_rnaseq_2018/RNAseq/shell_scripts/run.quant.featurecounts.star.KO01.sh; sleep 2
## goal( [ '/CRI/HPC/cri_rnaseq_2018/RNAseq/Quantification/featurecounts/star/KO01/KO01.star.featurecounts.count' ] )
This code chunk will invoke the bash script (e.g., RNAseq/shell_scripts/run.quant.featurecounts.star.KO01.sh) to execute expression quantification tool (i.e., Subread::featureCounts on the sample KO01.
After the completion of the entire pipeline, you can check the quantification results of each individual samples; for instance, the results of KO01 will be as follows.
$ tree RNAseq/Quantification/featurecounts/star/KO01
RNAseq/Quantification/featurecounts/star/KO01
|-- KO01.star.featurecounts.count
|-- KO01.star.featurecounts.count.jcounts
|-- KO01.star.featurecounts.count.summary
`-- run.quant.featurecounts.star.KO01.log
You can check quantification statistics (e.g., RNAseq/Quantification/featurecounts/star/KO01KO01.star.featurecounts.count.summary) for more information provided by Subread::featureCounts
$ cat RNAseq/Quantification/featurecounts/star/KO01/KO01.star.featurecounts.count.summary
## Status /CRI/HPC/cri_rnaseq_2018/RNAseq/Aln/star/KO01/KO01.star.bam
## Assigned 55814495
## Unassigned_Unmapped 0
## Unassigned_MappingQuality 0
## Unassigned_Chimera 0
## Unassigned_FragmentLength 0
## Unassigned_Duplicate 0
## Unassigned_MultiMapping 0
## Unassigned_Secondary 0
## Unassigned_Nonjunction 0
## Unassigned_NoFeatures 4118361
## Unassigned_Overlapping_Length 0
## Unassigned_Ambiguity 3979246
Or, the top 10 most abundant genes in the sample KO01 will be as follows.
$ cat <(head -n2 RNAseq/Quantification/featurecounts/star/KO01/KO01.star.featurecounts.count | tail -n+2 | cut -f1,7) <(cut -f1,7 RNAseq/Quantification/featurecounts/star/KO01/KO01.star.featurecounts.count | sort -k2,2nr | head)
Top 10 most abundant genes in KO01
|
Geneid
|
Chr
|
Start
|
End
|
Strand
|
Length
|
KO01
|
|
ENSG00000198886.2
|
chrM
|
10760
|
12137
|
|
1378
|
655558
|
|
ENSG00000211899.9
|
chr14
|
105851708
|
105856218
|
|
1868
|
523725
|
|
ENSG00000210082.2
|
chrM
|
1671
|
3229
|
|
1559
|
428124
|
|
ENSG00000198804.2
|
chrM
|
5904
|
7445
|
|
1542
|
420494
|
|
ENSG00000167658.15
|
chr19
|
3976056
|
3985469
|
|
4027
|
339682
|
|
ENSG00000198786.2
|
chrM
|
12337
|
14148
|
|
1812
|
330684
|
|
ENSG00000198727.2
|
chrM
|
14747
|
15887
|
|
1141
|
299516
|
|
ENSG00000075624.14
|
chr7
|
5527147
|
5563784
|
|
3519
|
288961
|
|
ENSG00000198712.1
|
chrM
|
7586
|
8269
|
|
684
|
255987
|
|
ENSG00000074695.5
|
chr18
|
59327823
|
59359962
|
|
6767
|
229511
|
Step 4-1: Identify Differentially Expressed Genes (DEGs) | Top
In this step, the pipeline will identify differentially expressed genes (DEG) according to the alignment result files (i.e., BAM files) after the alignment step.
The BDS code snippets for the example dataset will look like:
$ grep -A1 run.call.*.featurecounts.star.*.sh Submit_*.bds
## dep( [ '/CRI/HPC/cri_rnaseq_2018/RNAseq/DEG/edger/featurecounts/star/cri_rnaseq_2018.star.featurecounts.edger.count.txt', '/CRI/HPC/cri_rnaseq_2018/RNAseq/DEG/edger/featurecounts/star/cri_rnaseq_2018.star.featurecounts.edger.test.DEG.txt' ] <- [ '/CRI/HPC/cri_rnaseq_2018/RNAseq/Quantification/featurecounts/star/KO01/KO01.star.featurecounts.count', '/CRI/HPC/cri_rnaseq_2018/RNAseq/Quantification/featurecounts/star/KO02/KO02.star.featurecounts.count', '/CRI/HPC/cri_rnaseq_2018/RNAseq/Quantification/featurecounts/star/KO03/KO03.star.featurecounts.count', '/CRI/HPC/cri_rnaseq_2018/RNAseq/Quantification/featurecounts/star/WT01/WT01.star.featurecounts.count', '/CRI/HPC/cri_rnaseq_2018/RNAseq/Quantification/featurecounts/star/WT02/WT02.star.featurecounts.count', '/CRI/HPC/cri_rnaseq_2018/RNAseq/Quantification/featurecounts/star/WT03/WT03.star.featurecounts.count' ], cpus := 4, mem := 32*G, timeout := 72*hour, taskName := "edger.featurecounts.star.cri_rnaseq_2018") sys bash /CRI/HPC/cri_rnaseq_2018/RNAseq/shell_scripts/run.call.edger.featurecounts.star.cri_rnaseq_2018.sh; sleep 2
## goal( [ '/CRI/HPC/cri_rnaseq_2018/RNAseq/DEG/edger/featurecounts/star/cri_rnaseq_2018.star.featurecounts.edger.count.txt' ] )
## --
## dep( [ '/CRI/HPC/cri_rnaseq_2018/RNAseq/DEG/deseq2/featurecounts/star/cri_rnaseq_2018.star.featurecounts.deseq2.count.txt', '/CRI/HPC/cri_rnaseq_2018/RNAseq/DEG/deseq2/featurecounts/star/cri_rnaseq_2018.star.featurecounts.deseq2.test.DEG.txt' ] <- [ '/CRI/HPC/cri_rnaseq_2018/RNAseq/Quantification/featurecounts/star/KO01/KO01.star.featurecounts.count', '/CRI/HPC/cri_rnaseq_2018/RNAseq/Quantification/featurecounts/star/KO02/KO02.star.featurecounts.count', '/CRI/HPC/cri_rnaseq_2018/RNAseq/Quantification/featurecounts/star/KO03/KO03.star.featurecounts.count', '/CRI/HPC/cri_rnaseq_2018/RNAseq/Quantification/featurecounts/star/WT01/WT01.star.featurecounts.count', '/CRI/HPC/cri_rnaseq_2018/RNAseq/Quantification/featurecounts/star/WT02/WT02.star.featurecounts.count', '/CRI/HPC/cri_rnaseq_2018/RNAseq/Quantification/featurecounts/star/WT03/WT03.star.featurecounts.count' ], cpus := 4, mem := 32*G, timeout := 72*hour, taskName := "deseq2.featurecounts.star.cri_rnaseq_2018") sys bash /CRI/HPC/cri_rnaseq_2018/RNAseq/shell_scripts/run.call.deseq2.featurecounts.star.cri_rnaseq_2018.sh; sleep 2
## goal( [ '/CRI/HPC/cri_rnaseq_2018/RNAseq/DEG/deseq2/featurecounts/star/cri_rnaseq_2018.star.featurecounts.deseq2.count.txt' ] )
## --
## dep( [ '/CRI/HPC/cri_rnaseq_2018/RNAseq/DEG/limma/featurecounts/star/cri_rnaseq_2018.star.featurecounts.limma.count.txt', '/CRI/HPC/cri_rnaseq_2018/RNAseq/DEG/limma/featurecounts/star/cri_rnaseq_2018.star.featurecounts.limma.test.DEG.txt' ] <- [ '/CRI/HPC/cri_rnaseq_2018/RNAseq/Quantification/featurecounts/star/KO01/KO01.star.featurecounts.count', '/CRI/HPC/cri_rnaseq_2018/RNAseq/Quantification/featurecounts/star/KO02/KO02.star.featurecounts.count', '/CRI/HPC/cri_rnaseq_2018/RNAseq/Quantification/featurecounts/star/KO03/KO03.star.featurecounts.count', '/CRI/HPC/cri_rnaseq_2018/RNAseq/Quantification/featurecounts/star/WT01/WT01.star.featurecounts.count', '/CRI/HPC/cri_rnaseq_2018/RNAseq/Quantification/featurecounts/star/WT02/WT02.star.featurecounts.count', '/CRI/HPC/cri_rnaseq_2018/RNAseq/Quantification/featurecounts/star/WT03/WT03.star.featurecounts.count' ], cpus := 4, mem := 32*G, timeout := 72*hour, taskName := "limma.featurecounts.star.cri_rnaseq_2018") sys bash /CRI/HPC/cri_rnaseq_2018/RNAseq/shell_scripts/run.call.limma.featurecounts.star.cri_rnaseq_2018.sh; sleep 2
## goal( [ '/CRI/HPC/cri_rnaseq_2018/RNAseq/DEG/limma/featurecounts/star/cri_rnaseq_2018.star.featurecounts.limma.count.txt' ] )
This code chunk will invoke few bash scripts (e.g., RNAseq/shell_scripts/run.call.edger.featurecounts.star.cri_rnaseq_2018.sh, run.call.deseq2.featurecounts.star.cri_rnaseq_2018.sh, and run.call.limma.featurecounts.star.cri_rnaseq_2018.sh) to execute differential expression (DE) analysis using three the state-of-the-art tools (i.e., edgeR, DESeq2, and limma) on the example dataset of six samples from KO01 to WT03.
There are three DE analysis tools used in the current pipeline, including
- edgeR: Empirical Analysis of Digital Gene Expression Data in R
- DESeq2: Differential gene expression analysis based on the negative binomial distribution
- limma: Linear Models for Microarray Data
After the completion of the entire pipeline, you can check the calling results of each individual methods; for instance, the analysis results of the example dataset will be as follows.
$ tree RNAseq/DEG/*/featurecounts/star/
RNAseq/DEG/deseq2/featurecounts/star/
|-- cri_rnaseq_2018.star.featurecounts.deseq2.RData
|-- cri_rnaseq_2018.star.featurecounts.deseq2.count.ntd.meanSdPlot.pdf
|-- cri_rnaseq_2018.star.featurecounts.deseq2.count.ntd.txt
|-- cri_rnaseq_2018.star.featurecounts.deseq2.count.rld.meanSdPlot.pdf
|-- cri_rnaseq_2018.star.featurecounts.deseq2.count.rld.txt
|-- cri_rnaseq_2018.star.featurecounts.deseq2.count.txt
|-- cri_rnaseq_2018.star.featurecounts.deseq2.count.vst.meanSdPlot.pdf
|-- cri_rnaseq_2018.star.featurecounts.deseq2.count.vst.txt
|-- cri_rnaseq_2018.star.featurecounts.deseq2.plotDispEsts.pdf
|-- cri_rnaseq_2018.star.featurecounts.deseq2.plotMA.pdf
|-- cri_rnaseq_2018.star.featurecounts.deseq2.test.DEG.txt
|-- cri_rnaseq_2018.star.featurecounts.deseq2.test.txt
`-- run.call.deseq2.featurecounts.star.cri_rnaseq_2018.log
RNAseq/DEG/edger/featurecounts/star/
|-- cri_rnaseq_2018.star.featurecounts.edger.RData
|-- cri_rnaseq_2018.star.featurecounts.edger.count.txt
|-- cri_rnaseq_2018.star.featurecounts.edger.plotBCV.pdf
|-- cri_rnaseq_2018.star.featurecounts.edger.plotMA.pdf
|-- cri_rnaseq_2018.star.featurecounts.edger.plotSmear.pdf
|-- cri_rnaseq_2018.star.featurecounts.edger.test.DEG.txt
|-- cri_rnaseq_2018.star.featurecounts.edger.test.txt
`-- run.call.edger.featurecounts.star.cri_rnaseq_2018.log
RNAseq/DEG/limma/featurecounts/star/
|-- cri_rnaseq_2018.star.featurecounts.limma.RData
|-- cri_rnaseq_2018.star.featurecounts.limma.count.txt
|-- cri_rnaseq_2018.star.featurecounts.limma.count.voom.meanSdPlot.pdf
|-- cri_rnaseq_2018.star.featurecounts.limma.plotMA.pdf
|-- cri_rnaseq_2018.star.featurecounts.limma.test.DEG.txt
|-- cri_rnaseq_2018.star.featurecounts.limma.test.txt
|-- cri_rnaseq_2018.star.featurecounts.limma.voom.mean-variance.pdf
`-- run.call.limma.featurecounts.star.cri_rnaseq_2018.log
You can check statistical test results per gene (e.g., RNAseq/DEG/deseq2/featurecounts/star/cri_rnaseq_2018.star.featurecounts.deseq2.test.txt) for more information generated by each method.
$ head RNAseq/DEG/deseq2/featurecounts/star/cri_rnaseq_2018.star.featurecounts.deseq2.test.txt
Statistical test result per gene using DESeq2
|
|
Geneid
|
baseMean
|
log2FoldChange
|
lfcSE
|
stat
|
pvalue
|
FDR
|
DEG
|
|
4249
|
ENSG00000120738.7
|
5193.2844
|
1.960273
|
0.1147987
|
17.07574
|
0
|
0
|
1
|
|
9872
|
ENSG00000170345.9
|
422.7627
|
1.960006
|
0.1246756
|
15.72085
|
0
|
0
|
1
|
|
4273
|
ENSG00000113070.7
|
1441.0826
|
1.386086
|
0.1041939
|
13.30296
|
0
|
0
|
1
|
|
9601
|
ENSG00000100867.14
|
356.3080
|
1.707939
|
0.1339178
|
12.75364
|
0
|
0
|
1
|
|
8830
|
ENSG00000229117.8
|
7121.3363
|
-1.270634
|
0.1061629
|
-11.96872
|
0
|
0
|
-1
|
|
7342
|
ENSG00000035403.17
|
855.0647
|
-1.158327
|
0.1024377
|
-11.30763
|
0
|
0
|
-1
|
|
587
|
ENSG00000177606.6
|
556.6779
|
1.267243
|
0.1122316
|
11.29132
|
0
|
0
|
1
|
|
4407
|
ENSG00000038274.16
|
3176.8050
|
-1.277175
|
0.1135558
|
-11.24711
|
0
|
0
|
-1
|
|
7462
|
ENSG00000099194.5
|
27347.1594
|
-1.157671
|
0.1036466
|
-11.16941
|
0
|
0
|
-1
|
|
734
|
ENSG00000134215.15
|
2470.3706
|
1.200234
|
0.1078677
|
11.12691
|
0
|
0
|
1
|
Or, the exploratory plots of the example dataset produced by DESeq2 will be as follows.
- An exploratory plot of the per-gene dispersion estimates together with the fitted mean-dispersion relationship
- An exploratory plot of row standard deviations versus row means using the normalized counts transformation (f(count + pc))
- An exploratory plot of row standard deviations versus row means using the variance stabilizing transformation
- An exploratory plot of row standard deviations versus row means using the ‘regularized log’ transformation
- A MA plot of log2 fold changes (on the y-axis) versus the mean of normalized counts (on the x-axis)
- A scatter plot of log2 fold changes (on the y-axis) versus the FDR (on the x-axis)
Step 4-2: DEG Statistics | Top
In this step, the pipeline will collect DEG statistics and identify the overlapping set of identified DEGs from the previous methods.
The BDS code snippet for the sample KO01 will look like:
$ grep -A1 run.lociStat.featurecounts.star.*.sh Submit_*.bds
## dep( [ '/CRI/HPC/cri_rnaseq_2018/RNAseq/LociStat/featurecounts/star/cri_rnaseq_2018/cri_rnaseq_2018.star.featurecounts.overlap.txt' ] <- [ '/CRI/HPC/cri_rnaseq_2018/RNAseq/DEG/edger/featurecounts/star/cri_rnaseq_2018.star.featurecounts.edger.test.DEG.txt', '/CRI/HPC/cri_rnaseq_2018/RNAseq/DEG/deseq2/featurecounts/star/cri_rnaseq_2018.star.featurecounts.deseq2.test.DEG.txt', '/CRI/HPC/cri_rnaseq_2018/RNAseq/DEG/limma/featurecounts/star/cri_rnaseq_2018.star.featurecounts.limma.test.DEG.txt' ], cpus := 4, mem := 32*G, timeout := 72*hour, taskName := "lociStat.featurecounts.star.cri_rnaseq_2018") sys bash /CRI/HPC/cri_rnaseq_2018/RNAseq/shell_scripts/run.lociStat.featurecounts.star.cri_rnaseq_2018.sh; sleep 2
## goal( [ '/CRI/HPC/cri_rnaseq_2018/RNAseq/LociStat/featurecounts/star/cri_rnaseq_2018/cri_rnaseq_2018.star.featurecounts.overlap.txt' ] )
This code chunk will invoke the bash script (e.g., RNAseq/shell_scripts/run.lociStat.featurecounts.star.cri_rnaseq_2018.sh) to collect DEG statistics and to make a Venn diagram plot.
After the completion of the entire pipeline, you can check the statistics result of DEGs per method; for instance, the example dataset DLBC will be as follows.
$ grep -A5 'Up/Down regulated DEGs per methods' RNAseq/LociStat/featurecounts/star/*/run.lociStat.featurecounts.star.*.log | tail -n+2
## grep: result/run.lociStat.featurecounts.star.*.log: No such file or directory
$ cut -f1,2,4 RNAseq/LociStat/featurecounts/star/*/*.star.featurecounts.VennList.txt
DEG Statistics
|
Methods
|
Method.Num
|
ID.Num
|
|
edger
|
1
|
40
|
|
deseq2
|
1
|
14
|
|
limma
|
1
|
45
|
|
edger&deseq2
|
2
|
85
|
|
edger&limma
|
2
|
86
|
|
deseq2&limma
|
2
|
12
|
|
edger&deseq2&limma
|
3
|
660
|
There is a Venn diagram plot will be generated after this step.

Step 5: Sample Correlation | Top
In this step, the pipeline will make a PCA plot based on the transcriptional profiling of all samples.
The BDS code snippet for the sample KO01 will look like:
$ grep -A1 run.quantQC.pca.featurecounts.star.*.sh Submit_*.bds
## dep( [ '/CRI/HPC/cri_rnaseq_2018/RNAseq/QuantQC/featurecounts/star/cri_rnaseq_2018.star.featurecounts.pca.pdf' ] <- [ '/CRI/HPC/cri_rnaseq_2018/RNAseq/DEG/deseq2/featurecounts/star/cri_rnaseq_2018.star.featurecounts.deseq2.count.txt' ], cpus := 8, mem := 64*G, timeout := 72*hour, taskName := "pca.featurecounts.star.cri_rnaseq_2018") sys bash /CRI/HPC/cri_rnaseq_2018/RNAseq/shell_scripts/run.quantQC.pca.featurecounts.star.cri_rnaseq_2018.sh; sleep 2
## goal( [ '/CRI/HPC/cri_rnaseq_2018/RNAseq/QuantQC/featurecounts/star/cri_rnaseq_2018.star.featurecounts.pca.pdf' ] )
This code chunk will invoke the bash script (e.g., RNAseq/shell_scripts/run.quantQC.pca.featurecounts.star.cri_rnaseq_2018.sh) to make a PCA plot based on the alignment quantification result generated by DESeq2 or one of DE analysis tools.
After the completion of the entire pipeline, you can check the PCA plot under the folder of QuantQC/.

Step 6: Heat Map | Top
In this step, the pipeline will make a heat map based on the overlapping set of DEGs identified across different DE analysis tools.
The BDS code snippet for the sample KO01 will look like:
$ grep -A1 run.postAna.pheatmap.featurecounts.star.*.sh Submit_*.bds
## dep( [ '/CRI/HPC/cri_rnaseq_2018/RNAseq/PostAna/pheatmap/featurecounts/star/cri_rnaseq_2018/cri_rnaseq_2018.star.featurecounts.heatmap.pdf' ] <- [ '/CRI/HPC/cri_rnaseq_2018/RNAseq/LociStat/featurecounts/star/cri_rnaseq_2018/cri_rnaseq_2018.star.featurecounts.overlap.txt' ], cpus := 4, mem := 32*G, timeout := 72*hour, taskName := "postAna.pheatmap.featurecounts.star.cri_rnaseq_2018") sys bash /CRI/HPC/cri_rnaseq_2018/RNAseq/shell_scripts/run.postAna.pheatmap.featurecounts.star.cri_rnaseq_2018.sh; sleep 2
## goal( [ '/CRI/HPC/cri_rnaseq_2018/RNAseq/PostAna/pheatmap/featurecounts/star/cri_rnaseq_2018/cri_rnaseq_2018.star.featurecounts.heatmap.pdf' ] )
This code chunk will invoke the bash script (e.g., RNAseq/shell_scripts/run.postAna.pheatmap.featurecounts.star.cri_rnaseq_2018.sh) to make a heat map plot based on the overlapping set of DEGs identified across different DE analysis tools (i.e., edgeR, DESeq2, and limma).
After the completion of the entire pipeline, you can check the heat map under the folder of PostAna/pheatmap.

Step 7: Functional Enrichment Analysis | Top
In this step, the pipeline will conduct enrichment analysis and make several exploratory plots based on the overlapping set of DEGs identified across different DE analysis tools.
The BDS code snippet for the projet DLBC will look like:
$ grep -A1 run.postAna.clusterprofiler.featurecounts.star.*.sh Submit_*.bds
## dep( [ '/CRI/HPC/cri_rnaseq_2018/RNAseq/PostAna/clusterprofiler/featurecounts/star/cri_rnaseq_2018/cri_rnaseq_2018.star.featurecounts.enrichGO.ALL.txt' ] <- [ '/CRI/HPC/cri_rnaseq_2018/RNAseq/LociStat/featurecounts/star/cri_rnaseq_2018/cri_rnaseq_2018.star.featurecounts.overlap.txt' ], cpus := 4, mem := 32*G, timeout := 72*hour, taskName := "postAna.clusterprofiler.featurecounts.star.cri_rnaseq_2018") sys bash /CRI/HPC/cri_rnaseq_2018/RNAseq/shell_scripts/run.postAna.clusterprofiler.featurecounts.star.cri_rnaseq_2018.sh; sleep 2
## goal( [ '/CRI/HPC/cri_rnaseq_2018/RNAseq/PostAna/clusterprofiler/featurecounts/star/cri_rnaseq_2018/cri_rnaseq_2018.star.featurecounts.enrichGO.ALL.txt' ] )
This code chunk will invoke the bash script (e.g., RNAseq/shell_scripts/run.postAna.clusterprofiler.featurecounts.star.cri_rnaseq_2018.sh) to conduct enrichment analyses including GO and KEGG pathway erichment analyses as well as gene set enrichment analysis (GSEA).
After the completion of the entire pipeline, you can check the heat map under the folder of PostAna/pheatmap.
$ tree RNAseq/PostAna/clusterprofiler/featurecounts/star/*/
RNAseq/PostAna/clusterprofiler/featurecounts/star/cri_rnaseq_2018/
|-- cri_rnaseq_2018.star.featurecounts.enrichGO.ALL.cnetplot.pdf
|-- cri_rnaseq_2018.star.featurecounts.enrichGO.ALL.dotplot.pdf
|-- cri_rnaseq_2018.star.featurecounts.enrichGO.ALL.emapplot.pdf
|-- cri_rnaseq_2018.star.featurecounts.enrichGO.ALL.txt
|-- cri_rnaseq_2018.star.featurecounts.enrichGSEAGO.ALL.neg001.pdf
|-- cri_rnaseq_2018.star.featurecounts.enrichGSEAGO.ALL.pos001.pdf
|-- cri_rnaseq_2018.star.featurecounts.enrichGSEAGO.ALL.txt
|-- cri_rnaseq_2018.star.featurecounts.enrichGSEAKEGG.pos001.pdf
|-- cri_rnaseq_2018.star.featurecounts.enrichGSEAKEGG.txt
|-- cri_rnaseq_2018.star.featurecounts.enrichKEGG.cnetplot.pdf
|-- cri_rnaseq_2018.star.featurecounts.enrichKEGG.dotplot.pdf
|-- cri_rnaseq_2018.star.featurecounts.enrichKEGG.emapplot.pdf
|-- cri_rnaseq_2018.star.featurecounts.enrichKEGG.txt
`-- run.postAna.clusterprofiler.featurecounts.star.cri_rnaseq_2018.log
Below, several plots will be generated based on the overlapping set of DEGs using GO database as example.
- An exploratory dot plot for enrichment result
- An exploratory enrichment map for enrichment result of over-representation test
- An exploratory Gene-Concept Network plot of over-representation test
- An exploratory plot of GSEA result with NES great than zero
- An exploratory plot of GSEA result with NES less than zero